TopPI: An Efficient Algorithm for Item-Centric Mining
Abstract
We introduce TopPI, a new semantics and algorithm designed to mine long-tailed datasets. For each item, regardless of its frequency, TopPI finds the k most frequent closed itemsets that the item belongs to. For example, in our retail dataset, TopPI finds the itemset “nori seaweed, wasabi, sushi rice, soy sauce”, which occurs in only 133 store receipts out of 290 million. It also finds the itemset “milk, puff pastry”, which appears 152,991 times. Thanks to dynamic threshold adjustment and an adequate pruning strategy, TopPI efficiently traverses the relevant parts of the search space and can be parallelized on multi-core machines. Our experiments on datasets with different characteristics show the high performance of TopPI and its superiority when compared to state-of-the-art mining algorithms. We show experimentally on real datasets that TopPI allows the analyst to explore and discover valuable itemsets.
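The item-centric semantics described in the abstract can be illustrated with a small brute-force sketch. This is not the TopPI algorithm: it has no dynamic threshold adjustment, pruning, or parallelism, and it only scales to toy inputs. The function names (`closed_itemsets`, `top_k_per_item`), the sample receipts, the tie-breaking rule, and the choice to count single-item itemsets are all assumptions made for illustration. It relies on the standard fact that the closed itemsets of a dataset are exactly the non-empty intersections of its transactions.

```python
from collections import defaultdict


def closed_itemsets(transactions):
    """Return {closed itemset: support} by brute force (toy inputs only).

    Closed itemsets are exactly the non-empty intersections of
    transactions, so we close the transaction set under intersection.
    """
    txs = [frozenset(t) for t in transactions]
    closed = {t for t in txs if t}
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in txs:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new
    # Support = number of transactions that contain the itemset.
    return {c: sum(1 for t in txs if c <= t) for c in closed}


def top_k_per_item(transactions, k):
    """For each item, keep the k most frequent closed itemsets containing it.

    Assumption: ties are broken by the sorted item names, and
    single-item itemsets are allowed.
    """
    per_item = defaultdict(list)
    for itemset, support in closed_itemsets(transactions).items():
        for item in itemset:
            per_item[item].append((support, itemset))
    return {
        item: sorted(cands, key=lambda p: (-p[0], sorted(p[1])))[:k]
        for item, cands in per_item.items()
    }


# Hypothetical receipts, loosely modeled on the abstract's examples.
receipts = [
    {"milk", "puff pastry"},
    {"milk", "puff pastry", "eggs"},
    {"milk", "eggs"},
    {"nori", "wasabi", "sushi rice"},
    {"nori", "wasabi", "sushi rice", "milk"},
]
result = top_k_per_item(receipts, k=2)
for item in sorted(result):
    print(item, [(s, sorted(i)) for s, i in result[item]])
```

In this toy data, the rare item "nori" still gets its own ranked list, headed by the itemset {nori, sushi rice, wasabi} with support 2, even though "milk" dominates the dataset. A single global frequency threshold would either miss such itemsets or return far too many around frequent items, which is the long-tail problem the abstract refers to.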
Similar resources
Evolutionary Computing Assisted Wireless Sensor Network Mining for QoS-Centric and Energy-efficient Routing Protocol
The exponential rise in wireless communication demands and allied applications have revitalized academia-industries to develop more efficient routing protocols. Wireless Sensor Network (WSN) being battery operated network, it often undergoes node death-causing pre-ma...
A Novel Approach for finding Frequent Item Sets with Hybrid Strategies
Frequent item sets mining plays an important role in association rules mining. Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Therefore, a number of methods have been proposed recently to discover approximate frequent item sets. This paper proposes an efficient SMine (Sorted Mine) Algorithm for finding frequent ite...
An efficient hash based algorithm for mining closed frequent item sets
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent item sets, and then forming conditional implication rules among them. Efficient algorithms to discover frequent patterns are crucial in data mining research. Finding frequent item sets is computationally the most expensive step i...
An Efficient Algorithm for Mining Fuzzy Temporal Data
Mining patterns from fuzzy temporal data is an important data mining problem. One of these mining tasks is to find locally frequent sets. In most of the earlier works, fuzziness was considered in the time attribute of the datasets. Although a couple of works have been done in dealing with such data, little has been done on the implementation side. In this article, we propose an efficient implemen...
Calculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms
The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...